Conversation
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@doc/workloads/vllm.rst`:
- Around line 92-93: The disaggregated TOML snippet is ambiguous because
[extra_env_vars] is shown at top level without context; update the example in
vllm.rst to show a complete TOML block including the top-level keys (e.g., name,
test_template_name, executor/test settings) and explicitly show whether
[extra_env_vars] is a sibling of [cmd_args] or nested under it (for example,
include [cmd_args] with its keys then a separate [extra_env_vars] section), so
readers can unambiguously see the intended section hierarchy and placement of
CUDA_VISIBLE_DEVICES (a sketch of the intended layout follows this list).
- Line 73: Replace the awkward phrase "from less priority to more priority" in
the sentence "The number of GPUs can be controlled using the options below,
listed from less priority to more priority:" with a clearer alternative such as
"from lowest to highest priority" or "in order of increasing priority" so the
sentence reads e.g. "The number of GPUs can be controlled using the options
below, listed from lowest to highest priority:"; update the string where that
sentence appears in the vllm.rst documentation.
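To make the first point concrete, here is a hypothetical disaggregated test TOML parsed with the standard library; the key names (name, test_template_name, [cmd_args], [extra_env_vars], CUDA_VISIBLE_DEVICES) are taken from the comment above, while the concrete values and the full CloudAI schema are assumptions:

```python
import tomllib  # Python 3.11+

# Hypothetical test TOML; values and the cmd_args keys are placeholders,
# only the section layout is the point being illustrated.
toml_text = """
name = "vllm_disagg"
test_template_name = "Vllm"

[cmd_args]
model = "example/model"

[extra_env_vars]
CUDA_VISIBLE_DEVICES = "0,1,2,3"
"""

data = tomllib.loads(toml_text)
# With this layout, [extra_env_vars] is a sibling of [cmd_args], not nested under it:
assert "extra_env_vars" not in data["cmd_args"]
assert data["extra_env_vars"]["CUDA_VISIBLE_DEVICES"] == "0,1,2,3"
```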
In `@src/cloudai/workloads/vllm/report_generation_strategy.py`:
- Around line 47-48: The use of functools.cache on parse_vllm_bench_output
causes indefinite memoization by Path and can return stale results if the file
changes; update the function to either remove the `@cache` decorator or change the
cache key to include the file's modification state (e.g., use an explicit
memoization keyed by (res_file, res_file.stat().st_mtime) or a TTL/lru cache) so
cached entries are invalidated when the file is updated; locate
parse_vllm_bench_output and replace the `@cache` usage with one of these
strategies to ensure fresh results for changed files (a sketch keyed on file mtime follows this list).
- Around line 53-58: The except clause in the block that opens res_file and
calls VLLMBenchReport.model_validate(data) is redundant because
json.JSONDecodeError is already an Exception; update the except clause from
"except (json.JSONDecodeError, Exception):" to a single "except Exception:" so
it no longer lists duplicate exception types while preserving the current error
handling behavior.
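A small, self-contained illustration of why the tuple in that except clause is redundant (not the project code itself):

```python
import json

# json.JSONDecodeError subclasses ValueError, which subclasses Exception, so
# "except (json.JSONDecodeError, Exception)" catches nothing beyond "except Exception".
assert issubclass(json.JSONDecodeError, ValueError)
assert issubclass(json.JSONDecodeError, Exception)

try:
    json.loads("not valid json")
except Exception as exc:  # the single clause is enough
    print(type(exc).__name__)  # -> JSONDecodeError
```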
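And a minimal sketch of the mtime-keyed memoization suggested for parse_vllm_bench_output two comments up. The real function validates the data against VLLMBenchReport; this sketch returns the raw parsed dict instead, and the exact signature is an assumption:

```python
import json
from functools import lru_cache
from pathlib import Path
from typing import Any, Optional


@lru_cache(maxsize=None)
def _parse_cached(res_file: Path, mtime_ns: int) -> Optional[dict[str, Any]]:
    # mtime_ns is only part of the cache key: when the file changes, the key
    # changes and the stale entry is never returned again.
    try:
        with res_file.open() as f:
            return json.load(f)  # the real code would validate into VLLMBenchReport here
    except Exception:
        return None


def parse_vllm_bench_output(res_file: Path) -> Optional[dict[str, Any]]:
    return _parse_cached(res_file, res_file.stat().st_mtime_ns)
```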
In `@src/cloudai/workloads/vllm/slurm_command_gen_strategy.py`:
- Around line 258-268: The script launches the proxy in background (proxy_cmd,
PROXY_PID) and immediately starts the benchmark (bench_cmd), causing potential
failures if the proxy isn't ready; update the generated shell to wait for proxy
readiness by invoking the existing wait_for_health helper (or a short sleep)
against the proxy endpoint after starting the proxy and before running
bench_cmd, ensuring the health check references the same proxy port/URL used by
proxy_cmd and still retains PROXY_PID handling.
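A sketch of what the generated fragment could look like with the readiness gate in place. proxy_cmd, bench_cmd, and PROXY_PID come from the comment above; the port, the /health path, and the assumption that wait_for_health is callable from the generated script are illustrative only:

```python
def disagg_script_lines(proxy_cmd: str, bench_cmd: str, proxy_port: int = 8000) -> list[str]:
    """Return the shell fragment with a readiness check between proxy start and benchmark."""
    return [
        f"{proxy_cmd} &",        # launch the proxy in the background
        "PROXY_PID=$!",
        # Block until the proxy answers before starting the benchmark; the URL must
        # point at the same port proxy_cmd listens on.
        f"wait_for_health http://localhost:{proxy_port}/health",
        bench_cmd,
        "kill $PROXY_PID",       # PROXY_PID handling is preserved
    ]


print("\n".join(disagg_script_lines("echo start-proxy", "echo run-bench")))
```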
In `@tests/slurm_command_gen_strategy/test_vllm_slurm_command_gen_strategy.py`:
- Around line 55-60: The fixture vllm_disagg_tr mutates the shared vllm fixture;
instead create a fresh VllmTestDefinition instance (or deep copy the existing
vllm) inside vllm_disagg_tr, set its extra_env_vars =
{"CUDA_VISIBLE_DEVICES":"0,1,2,3"} and its cmd_args.prefill = VllmArgs() on that
new instance, then pass the new instance to TestRun(test=...) so vllm remains
unchanged; reference the vllm_disagg_tr fixture, VllmTestDefinition (or use
copy.deepcopy(vllm)), TestRun, and VllmArgs when making the change.
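A sketch of the deep-copy variant of that fixture. VllmTestDefinition, VllmArgs, and TestRun are the project types named above (imports omitted here); the TestRun arguments beyond test= follow whatever the existing fixture already passes:

```python
import copy

import pytest


@pytest.fixture
def vllm_disagg_tr(vllm, tmp_path):
    # Deep-copy so the shared `vllm` fixture is never mutated by this fixture.
    vllm_disagg = copy.deepcopy(vllm)
    vllm_disagg.extra_env_vars = {"CUDA_VISIBLE_DEVICES": "0,1,2,3"}
    vllm_disagg.cmd_args.prefill = VllmArgs()  # enables the disaggregated prefill stage
    # Pass the fresh copy on; remaining TestRun arguments as in the existing fixture.
    return TestRun(test=vllm_disagg)
```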
Greptile Summary: This PR adds comprehensive vLLM support to CloudAI with both aggregated (single instance) and disaggregated (separate prefill/decode) modes for single-node execution. Confidence Score: 2/5.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/cloudai/workloads/vllm/report_generation_strategy.py`:
- Around line 31-45: VLLMBenchReport defines std_ttft_ms and std_tpot_ms but
they aren't shown in the generated report table; either remove these fields or
add them to the displayed metrics—update the report-generation code that
currently renders mean_ttft_ms/median_ttft_ms/p99_ttft_ms and
mean_tpot_ms/median_tpot_ms/p99_tpot_ms to also include std_ttft_ms and
std_tpot_ms (add headers, column values and formatting consistent with the other
stats), or delete std_ttft_ms/std_tpot_ms from VLLMBenchReport if intentionally
unused; ensure any serialization/deserialization and tests reference the updated
schema.
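A sketch of what including the std columns next to the existing stats could look like; the field names follow the comment above, while the table layout, the stand-in dataclass, and the sample values are assumptions rather than the actual report code:

```python
from dataclasses import dataclass


@dataclass
class LatencyStats:
    # Stand-in for the TTFT/TPOT portion of VLLMBenchReport.
    mean_ms: float
    median_ms: float
    std_ms: float
    p99_ms: float


def render_latency_table(ttft: LatencyStats, tpot: LatencyStats) -> str:
    header = f"{'Metric':<12}{'Mean':>10}{'Median':>10}{'Std':>10}{'P99':>10}"
    rows = [
        f"{'TTFT (ms)':<12}{ttft.mean_ms:>10.2f}{ttft.median_ms:>10.2f}{ttft.std_ms:>10.2f}{ttft.p99_ms:>10.2f}",
        f"{'TPOT (ms)':<12}{tpot.mean_ms:>10.2f}{tpot.median_ms:>10.2f}{tpot.std_ms:>10.2f}{tpot.p99_ms:>10.2f}",
    ]
    return "\n".join([header, *rows])


# Sample values for illustration only.
print(render_latency_table(LatencyStats(120.4, 115.2, 18.7, 190.3), LatencyStats(9.8, 9.5, 1.2, 14.0)))
```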
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/cloudai/workloads/vllm/report_generation_strategy.py`:
- Around line 64-65: The cache key issue comes from passing potentially
non-normalized Path objects into parse_vllm_bench_output from
can_handle_directory; update can_handle_directory to resolve the path (e.g.,
call self.test_run.output_path.resolve() or resolve() on the
VLLM_BENCH_JSON_FILE path) before passing it to parse_vllm_bench_output so the
cached key is consistent with generate_report and other callers, and likewise
ensure any other call sites (like generate_report) also resolve the path before
invoking parse_vllm_bench_output to avoid inconsistent cache hits/misses.
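A minimal sketch of resolving at the call site so can_handle_directory and generate_report hash to the same cache key; VLLM_BENCH_JSON_FILE and the method name come from the comment above, while its value, the free-function form, and the parse_vllm_bench_output wrapper (sketched earlier) are assumptions:

```python
from pathlib import Path

VLLM_BENCH_JSON_FILE = "vllm_bench.json"  # placeholder value; the real constant lives in the module


def can_handle_directory(output_path: Path) -> bool:
    # Resolve before calling the cached parser so relative or symlinked paths
    # produce the same cache key that generate_report uses.
    res_file = (output_path / VLLM_BENCH_JSON_FILE).resolve()
    return parse_vllm_bench_output(res_file) is not None
```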
Summary
Two modes are supported at the moment, single node only: aggregated (a single vLLM instance) and disaggregated (separate prefill and decode instances).
Test Plan
Additional Notes